Make pango-assignments work with datadir #443

Closed


pvanheus
Contributor

@pvanheus pvanheus commented May 4, 2022

This incorporates #430 and also:

  1. Load the pangolin assignment cache from --datadir if it is present there and newer than the already-installed version.
  2. If --add-assignment-cache and --datadir are both specified, download the assignment cache into the datadir.
  3. If --update or --update-data is specified together with --datadir, put the updated versions of the data in the datadir.
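
For illustration only, the selection rule in item 1 and the download-location rule in items 2 and 3 could be sketched as below. This is not the code in this PR: the function names `parse_version`, `pick_assignment_cache`, and `download_target` are hypothetical, and the sketch assumes versions are plain dotted numbers with an optional leading `v`.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as 'v1.2.133' into a tuple of ints.

    Assumption: versions are purely numeric apart from an optional 'v' prefix.
    """
    return tuple(int(part) for part in version.lstrip("v").split("."))


def pick_assignment_cache(installed: Optional[str],
                          in_datadir: Optional[str]) -> Optional[str]:
    """Decide which copy of the assignment cache to load.

    Returns 'datadir', 'installed', or None when neither copy exists.
    The datadir copy wins only if it is strictly newer than the installed one.
    """
    if in_datadir is None:
        return "installed" if installed is not None else None
    if installed is None:
        return "datadir"
    if parse_version(in_datadir) > parse_version(installed):
        return "datadir"
    return "installed"


def download_target(datadir: Optional[str], default_dir: str) -> str:
    """Updated data goes into datadir when one was given (hypothetical helper)."""
    return datadir if datadir else default_dir
```

Comparing integer tuples rather than raw strings keeps a version such as 1.10 ordered after 1.9.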

(it is here as its own PR so that I can check that the CI testing passes)

AngieHinrichs and others added 9 commits April 8, 2022 10:42
…git-lfs.

Up to this point, all data dependencies have been github cov-lineages repositories.  The cache file in pangolin-assignment exceeded the github file size limit so we changed the pangolin-assignment repository to use git-lfs.  Thanks @pvanheus for pointing out that github has storage and bandwidth quotas for Git LFS usage, and that by default the pangolin-assignment release tarball from github does not include the cache file; it can be added to the release tarball, but will count further against the storage and bandwidth quotas.
Since the cache file is generated at UCSC, which has ample web server storage and bandwidth, this adds a new mechanism that searches a web directory for the latest versioned tarball (instead of querying the github API), compares its version to the locally installed package if present (using the same pip/__init__.py __version__ mechanism), and installs the tarball from the web directory (instead of from github).
Note: currently the URL for pangolin-assignment uses the hgdownload-test server; this will need to be changed to hgdownload after some testing and before release.
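
A rough sketch of the web-directory lookup described in this commit message, not the actual commit contents. It assumes the server returns an HTML index whose links look like `pangolin-assignment-v<version>.tar.gz`; that file name pattern and the function names `latest_tarball` and `needs_install` are assumptions made for illustration. The listing is passed in as a string, so no network access is needed to try it.

```python
import re
from typing import Optional, Tuple

# Assumed naming scheme for tarballs in the web directory (hypothetical).
TARBALL_RE = re.compile(r'href="(pangolin-assignment-v(\d+(?:\.\d+)*)\.tar\.gz)"')


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key for a dotted version string."""
    return tuple(int(part) for part in version.split("."))


def latest_tarball(listing_html: str) -> Optional[Tuple[str, str]]:
    """Return (file name, version) of the newest tarball in the listing, or None."""
    best = None
    for name, version in TARBALL_RE.findall(listing_html):
        if best is None or version_key(version) > version_key(best[1]):
            best = (name, version)
    return best


def needs_install(installed_version: Optional[str], latest_version: str) -> bool:
    """Install when nothing is installed yet or the web copy is strictly newer."""
    if installed_version is None:
        return True
    return version_key(latest_version) > version_key(installed_version)
```

In this sketch the installed version would come from the package's `__version__`, as the commit message describes, and the chosen file name would then be joined to the directory URL and handed to pip.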
…ssignment versions. There may be patch releases that make sense for pangolin-data but not pangolin-assignment (e.g. pangoLEARN patch update), and the suggestion to run --update-data is not helpful because that's how the versions came to be installed in the first place.
Also fix option name typo in github query exception message.
@AngieHinrichs
Member

Thanks @pvanheus! I think the datadir problem and the github-vs.-UCSC question are mostly orthogonal, and the datadir problem is more urgent while github-vs.-UCSC is more wait-and-see -- so far there haven't been complaints from cov-lineages about being charged $$ by github, and the UCSC method comes at a cost of complexity in rolling out releases so I'd rather not resort to it unless it becomes necessary.

So... if it's not too much trouble, would it be possible to make a PR against the master branch that doesn't include my #430 changes?

@AngieHinrichs AngieHinrichs marked this pull request as draft May 4, 2022 16:32
@pvanheus pvanheus closed this May 4, 2022